Generative Modeling for Maximizing Precision and Recall in Information Visualization
Abstract
Information visualization has recently been formulated as an information retrieval problem. The goal is to find similar data points based on the visualized nonlinear projection, and the visualization is optimized to maximize a compromise between (smoothed) precision and recall. We recast visualization as a generative modeling task in which a simple user model, parameterized by the data coordinates, is optimized; the neighborhood relations are the observed data, and straightforward maximum likelihood estimation corresponds to Stochastic Neighbor Embedding (SNE). While SNE maximizes pure recall, adding a mixture component that “explains away” misses allows our generative model to focus on maximizing precision as well. The resulting model is a generative solution for optimizing the tradeoff between precision and recall. It outperforms earlier models both in terms of precision and recall and in external validation by unsupervised classification.
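To make the modeling view in the abstract concrete, here is a minimal standard-library Python sketch. It assumes SNE-style Gaussian neighbor probabilities in the display with a fixed kernel width `sigma`, input neighborhoods given as probabilities p_{j|i}, and, for the precision-oriented variant, a uniform "explain-away" component mixed in with a fixed weight `rho`. The uniform component, the function names `neighbor_probs` and `log_likelihood`, and all parameter values are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math

def neighbor_probs(points, i, sigma=1.0):
    """Display-space neighbor distribution q_{j|i} for point i (SNE-style):
    a softmax over -||y_i - y_j||^2 / (2 sigma^2) across all j != i."""
    logits = {}
    for j, p in enumerate(points):
        if j == i:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], p))
        logits[j] = -d2 / (2.0 * sigma ** 2)
    m = max(logits.values())  # subtract the max logit for numerical stability
    weights = {j: math.exp(v - m) for j, v in logits.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def log_likelihood(points, p_input, rho=0.0, sigma=1.0):
    """Expected log-likelihood of the observed input neighborhoods
    p_input[i] (a dict j -> p_{j|i}) under a simple user model.

    rho = 0: the user picks neighbor j of i with probability q_{j|i};
             maximizing this equals minimizing sum_i KL(p_i || q_i), as in SNE.
    rho > 0: HYPOTHETICAL mixture with a uniform "explain-away" component,
             so a true neighbor placed far away (a miss) costs at most
             p_{j|i} * -log(rho / (n - 1)) instead of an unbounded penalty.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        q = neighbor_probs(points, i, sigma)
        for j, p_ji in p_input[i].items():
            if p_ji == 0.0:
                continue
            model = (1.0 - rho) * q[j] + rho / (n - 1)
            total += p_ji * (math.log(model) if model > 0.0 else -math.inf)
    return total

# Toy example: each of three points treats the other two as equally likely neighbors.
p_input = [{1: 0.5, 2: 0.5}, {0: 0.5, 2: 0.5}, {0: 0.5, 1: 0.5}]
layout = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]  # point 2 is pushed far away (misses)
ll_sne = log_likelihood(layout, p_input, rho=0.0)
ll_mix = log_likelihood(layout, p_input, rho=0.1)
```

In this toy layout the pure SNE objective (`rho=0`) is dominated by the far-away true neighbor, whose log-penalty grows without bound as it moves away; this is one way to see why plain maximum likelihood favors recall. With `rho > 0` the miss is absorbed by the background component and its cost is capped, which, under these assumptions, leaves the optimizer free to trade some recall for precision.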
Similar resources
The Role of Visualization in EFL Learners’ Reading Comprehension and Recall of Short Stories
Generally speaking, lexical items that enter our minds through reading a text commonly leave us with pictures, sounds, echoes, and feelings in the mind. While the ability to produce images in the mind in the process of reading appears to be vital for greater comprehension and recall of texts, research has indicated that many poor readers seemingly do not visualize as they read. On the contrary,...
Optimizing the Information Retrieval Trade-off in Data Visualization Using α-Divergence
Data visualization is one of the major applications of nonlinear dimensionality reduction. From the information retrieval perspective, the quality of a visualization can be evaluated by considering the extent that the neighborhood relation of each data point is maintained while the number of unrelated points that are retrieved is minimized. This property can be quantified as a trade-off between...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
The Relative Generality and Precision of Evidence-Based Medical Information Resources in the Recovery of Diabetes Information
Background and Aim: Relative generality and precision are two important criteria for measuring the efficiency and performance of information retrieval systems. The aim of this study was to compare the integrity and location of evidence-based bases in the digital library of Hamedan University of Medical Sciences in data retrieval of diabetes. Methods: The design of this research is cross-sect...
Dimensionality Reduction for Data Visualization
Dimensionality reduction is one of the basic operations in the toolbox of data-analysts and designers of machine learning and pattern recognition systems. Given a large set of measured variables but few observations, an obvious idea is to reduce the degrees of freedom in the measurements by representing them with a smaller set of more “condensed” variables. Another reason for reducing the dimen...
Publication date: 2011